fix: converter hf now handles byte characters. Closes #188 #189

antoine-sac · 2025-03-22T20:49:08Z

The converter-tokenizer-hf is now aware of byte characters (such as "<0x0A>") and parses them correctly as the actual character (such as a newline, "\n").

This is useful for Mistral, TinyLlama, and other models that use the same tokenization method.
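As a rough illustration of the idea (not the code from this PR), a byte token of the form "<0xNN>" can be mapped to the single byte it names, while every other token is left as ordinary text. The names `BYTE_TOKEN` and `token_to_bytes` are hypothetical and chosen only for this sketch, which assumes byte tokens always use exactly two hex digits.

```python
import re

# Hypothetical helper, for illustration only: matches tokens such as "<0x0A>".
# Assumes byte tokens are always written with exactly two hex digits.
BYTE_TOKEN = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def token_to_bytes(token: str) -> bytes:
    """Return the raw bytes a vocabulary token stands for."""
    match = BYTE_TOKEN.fullmatch(token)
    if match:
        # "<0x0A>" becomes the single byte 0x0A, i.e. a newline.
        return bytes([int(match.group(1), 16)])
    # Any other token is treated as plain text and encoded as UTF-8.
    return token.encode("utf-8")


if __name__ == "__main__":
    for tok in ["<0x0A>", "hello", "<0xZZ>"]:
        print(repr(tok), "->", token_to_bytes(tok))
```

Tokens that merely resemble the pattern, such as "<0xZZ>", fall through to the plain-text branch, so only well-formed byte tokens are rewritten.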

See ggml-org/llama.cpp#4622 for more context.

Fix #188.


b4rtaz · 2025-03-22T22:25:04Z

Thanks @antoine-sac!

fix: converter hf now handles byte characters. Closes b4rtaz#188

30b252f


b4rtaz merged commit ec2cb7f into b4rtaz:main Mar 22, 2025
3 checks passed
